Enhancing LTP-Driven Cache Management Using Reuse Distance Information

Authors

  • Wanli Liu
  • Donald Yeung
Abstract

Traditional caches employ the LRU policy to drive replacement decisions. However, previous studies have shown that LRU can perform significantly worse than the theoretical optimum, OPT [1]. To better match OPT, it is necessary to aggressively anticipate future memory references in the cache. Recently, several researchers have tried to approximate OPT management by predicting last touch references [2, 3, 4, 5]. Existing last touch predictors (LTPs) either correlate last touch references with execution signatures, such as instruction traces [3, 4] or last touch history [5], or they predict cache block lifetimes based on reference [2] or cycle [6] counts. On a predicted last touch, the referenced cache block is marked for early eviction. This permits cache blocks lower in the LRU stack, but with shorter reuse distances, to remain in cache longer, resulting in additional cache hits. This paper investigates three mechanisms to improve LTP-driven cache management. First, we propose exploiting reuse distance information to increase LTP accuracy. Specifically, we correlate a memory reference's last touch outcome with its global reuse distance history. Second, we advocate selecting the most-recently-used (MRU) last touch block for eviction. We find that an MRU victim selection policy evicts fewer LNO last touches [5] and fewer mispredicted LRU last touches. Our results show that for an 8-way 1 MB L2 cache, a 54 KB RD-LTP combining both mechanisms reduces the cache miss rate by 12.6% and 15.8% compared to LvP and AIP [2], two state-of-the-art last touch predictors, and by 9.3% compared to DIP [7], a recent insertion policy. Finally, we propose predicting actual reuse distance values using reuse distance predictors (RDPs). An RDP is very similar to an RD-LTP, except that its predictor table stores exact reuse distance values instead of last touch outcomes. Because RDPs predict reuse distances, we can distinguish between LNO and OPT last touches more accurately. Our results show a 64 KB RDP can improve the miss rate over an RD-LTP by an additional 2.7%.
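The two quantities the abstract builds on, LRU stack (reuse) distance and last touch references, can be illustrated with a toy trace-analysis sketch. This is not the paper's hardware predictor design; the function names and the example trace are hypothetical, offline approximations of what the RD-LTP observes online.

```python
# Toy illustration (not the paper's hardware design): LRU stack distance
# ("reuse distance") computation and last-touch marking for a block-address
# trace. Names and the example trace are hypothetical.

def stack_distances(trace):
    """Return the LRU stack distance of each reference: the number of
    distinct blocks touched since the previous access to the same block
    (None for a cold, first-time access)."""
    stack = []   # LRU stack; most-recently-used block at the end
    dists = []
    for blk in trace:
        if blk in stack:
            pos = stack.index(blk)
            dists.append(len(stack) - 1 - pos)  # distinct blocks in between
            stack.pop(pos)
        else:
            dists.append(None)                  # cold reference
        stack.append(blk)                       # blk becomes MRU
    return dists

def last_touches(trace):
    """Mark each reference True if it is the final touch of its block
    (computed offline by scanning the trace backwards)."""
    seen = set()
    marks = []
    for blk in reversed(trace):
        marks.append(blk not in seen)
        seen.add(blk)
    return list(reversed(marks))

trace = ["A", "B", "C", "A", "B", "D", "A"]
print(stack_distances(trace))  # [None, None, None, 2, 2, None, 2]
print(last_touches(trace))     # [False, False, True, False, True, True, True]
```

An online predictor only has the history up to the current reference, so it must predict these last-touch outcomes; the RD-LTP described above does so by correlating them with the recent history of reuse distances.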

Similar Papers

Using Locality and Interleaving Information to Improve Shared Cache Performance

Title of dissertation: Using Locality and Interleaving Information to Improve Shared Cache Performance Wanli Liu, Doctor of Philosophy, 2009 Dissertation directed by: Professor Donald Yeung Department of Electrical and Computer Engineering Cache interference is found to play a critical role in optimizing cache allocation among concurrent threads for a shared cache. The conventional LRU policy usu...


Understanding Unfulfilled Memory Reuse Potential in Scientific Applications

The potential for improving the performance of data-intensive scientific programs by enhancing data reuse in cache is substantial because CPUs are significantly faster than memory. Traditional performance tools typically collect or simulate cache miss counts or rates and attribute them at the function level. While such information identifies program scopes that suffer from poor data locality, i...


On the Theory and Potential of Collaborative Cache Management

The goal of cache management is to maximize data reuse. Collaborative caching provides an interface for software to communicate access information to hardware. In theory, it can obtain optimal cache performance. In this paper, we study a collaborative caching system that allows a program to choose different caching methods for its data. As an interface, it may be used in arbitrary ways, sometim...


Reuse-Aware Management for Last-Level Caches

Variability in generational behavior of cache blocks is a key challenge for cache management policies that aim to identify dead blocks as early and as accurately as possible to maximize cache efficiency. Existing management policies are limited by the metrics they use to identify dead blocks, leading to low coverage and/or low accuracy in the face of variability. In response, we introduce a new...


Studying the Impact of Multicore Processor Scaling on Cache Coherence Directories via Reuse Distance Analysis

Title of dissertation: Studying the Impact of Multicore Processor Scaling on Cache Coherence Directories via Reuse Distance Analysis Minshu Zhao, Doctor of Philosophy, 2015 Dissertation directed by: Professor Donald Yeung Department of Electrical and Computer Engineering Directories are one key part of a processor’s cache coherence hardware, and constitute one of the main bottlenecks in multico...



Journal:
  • J. Instruction-Level Parallelism

Volume 11  Issue 

Pages  -

Publication date: 2009